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ITERATIVE DECODING 



This nonprovisional application claims the benefit of U.S. provisional 
application No. 60/174,601 entitled "Map Decoding In Channels With Memory" filed on 
January 5, 2000. The Applicant of the provisional application is William Turin (Attorney 
Docket No. 105038). The above provisional application is hereby incorporated by 
reference including all references cited therein. 

BACKGROUND OF THE INVENTION 

1. Field of Invention 

This invention relates to iterative decoding of input sequences. 

2. Description of Related Art 

Maximum a posteriori (MAP) sequence decoding selects a most probable 

information sequence X^= (Xj, Xj, X^) that produced the received sequence Y,^ = 
(Yi, Y2, Yj). For transmitters and/or channels that are modeled using Hidden Markov 

Models (HMM), the process for obtaining the information sequence Xj that corresponds 
to a maximum probability is difficult due to a large number of possible hidden states as 

well as a large number of possible information sequences X^ . Thus, new technology is 
needed to improve MAP decoding for HMMs. 

SUMMARY OF THE INVENTION 
This invention provides an iterative process to maximum a posteriori (MAP) 
decoding. The iterative process uses an auxiliary function which is defined in terms of a 
complete data probability distribution* The MAP decoding is based on an expectation 
maximization (EM) algorithm which finds the maximum by iteratively maximizing the 
auxiliary function. For a special case of trellis coded modulation, the auxiliary function 
may be maximized by a combmation of forward-backward and Viterbi algorithms. The 
iterative process converges monotonically and thus improves the performance of any 
decoding algorithm. 

The MAP decoding decodes received inputs by minimizing a probability of error, 
A direct approach to achieve this minimization results in a complexity which grows 
exponentially with T, where T is the size of the input. The iterative process avoids this 
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complexity by converging on the MAP solution through repeated use of the auxiliary 
function. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention is described in detail with reference to the following figures where 
like numerals reference like elements, and wherein: 

Fig. 1 shows a diagram of a conununication system; 

Fig. 2 shows a flow chart of an exemplary iterative process; 

Figs. 3-6 show state trajectories determined by the iterative process; 

Fig. 7 shows an exemplary block diagram of the receiver shown in Fig. 1 ; 

Fig. 8 shows a flowchart for an exemplary process of the iterative process for a 
TCM example; and 

Fig. 9 shows step 1004 of the flowchart of Fig, 8 in greater detail. 
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

Fig. 1 shows an exemplary block diagram of a communication system 100. The 
communication system 100 includes a transmitter 102, a channel 106 and a receiver 104. 

The transmitter 102 receives an input information sequence (i.e., I^, I2, Ij) of length 
T, for example. The input information sequence may represent any type of data including 
analog voice, analog video, digital image, etc. The transmitter 102 may represent a 
speech synthesizer, a signal modulator, etc.; the receiver 104 may represent a speech 
recognizer, a radio receiver, etc.; and the channel 106 may be any medium through which 

the information sequence (i.e., X^, Xj, Xj) is conveyed to the receiver 104. The 

transmitter 102 may encode the information sequence and transmit encoded 

information sequence Xj^ through the channel 106 and the receiver 104 receives 

information sequence Y^^ (i.e., Yj, Y2, Yj), The problem in communications is, of 

course, to decode in such a way as to retrieve I^ . 

Maximum a posteriori (MAP) sequence decoding is a technique that decodes the 

received sequence Y/ by niinimizing a probability of error to obtain X^ (and if a model of 

the transmitter 102 is includ^, to obtain I^ ). In MAP, the goal is to choose a most 
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probable that produces the received . The MAP estimator may be expressed by 
equation 1 below. 

=argmaxPr(X^,Y,^) (1) 

where Pr(-) denotes a corresponding probability or probability density function and is 
an estimate of . Equation 1 sets X^ to the Xj that maximizes Pr(X[ , ). 

ThePr(x^,Yi^) term may be obtained by modeling the channel 1 06 of the 
communication system 100 using techniques such as Hidden Markov Models (HMMs). 
An input-output HMM X = (S,X,Y,n,{P(X,Y)}) is defmed by its internal states S = {1,2, 
, . .n}, inputs X, outputs Y, initial state probability vector and the input-output 
probability density matrices (PDMs) P(X,Y), XeX, YgY. The elements of P(X,Y), 
Pij(X,Y) = Pr(j,X,Y I i), are conditional probability density functions (PDFs) of input x 
and corresponding output y after transferring from the state i to state j. It is assimied that 

the state sequence So = (So? Sj, SJ, input sequence X| = (XuXjv-Xt), and output 

sequence = (Yi,Y2,. . .,Yt) possess the following Markovian property 

Pr(S„X,,Y,|sr^Xr^Y;-^) = Pr(S,,X„Y,|S,.,). 

Using HMM, the PDF of the input sequence X^ and output sequence Y,^ may be 
expressed by equation 2 below: 

T 

PT(X^Yi^)=7tnP(Xi,Yi)l (2) 

where 1 is a column vector of n ones, 7C is a vector of state initial probabilities, and n is a 
nimiber of states in the HMM. Thus, the MAP estimator when using HMM may be 
expressed by equation 3 below: 



X, =argmax 



n riP(Xi,Yi)l 



(3) 



i=l 

The maximization required by equation 3 is a difficult problem because all 
possible sequences of Xj must be considered. This requirement results in a complexity 
that grows exponentially with T, This invention provides an iterative process to obtain 
the maximum without the complexity of directly achieving the maximization by 
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evaluating equation 2 for all possible , for example. In the iterative process, an 
auxiliary function is developed whose iterative maximization generates a sequence of 
estimates for approaching the maximum point of equation 2. 

The iterative process is derived based on the expectation maximization (EM) 
algorithm. Because the EM algorithm converges monotonically, the iterative process 
may improve the performance of any decoding algorithm by using its output as an initial 
sequence of the iterative decoding algorithm. In the following description, it is assumed 
that HMM parameters for the channel 106 and/or the transmitter 102 are available either 
by design or by techniques such as training. 

The auxiliary function may be defined in terms of a complete data probability 
distribution shown in equation 4 below. 

T 

T(z,X^Y,^)=7Ci nPi.,i.(X„Y.), (4) 

where z = ij is an HMM state sequence, 7C is an initial probability vector for state io, 

and Pij(X,Y) are the elements of the matrix P(X,Y). The MAP estimator of equation 1 
can be obtained iteratively by equations 5-9 as shown below. 

Xjp+i == argmaxQ(X^,X?^p), p = 0,1,2,... (5) 

where p is a number of iterations and Q( X[ , X| p ) is the auxiliary function which may be 
expressed as 

Q(X^X;^p) = X^(2.X?;p,Y,^)log(^(z,X^Y,^)). (6) 

z 

The auxiliary function may be expanded based on equation 4 as follows: 

Q(X^X;,)=X X 2 y.,j(Xjp) log(Pij(X„ Y^) + C (7) 

t-1 i-I j=l 

where C does not depend on Xj , n is a number of states in the HMM and 

YuKp) = ai(XS, Yr')py(X,^, Y.) (8) 

where ai(X; „ , Y,' ) and PjC^Xj;, „ , Y^,, ) are the elements of the following forward and 
backward probability vectors 
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T T 

a( X| , Y,' ) = 71 n P(Xi , ) , and p( X^.^ , Y^, ) = H P(Xi,YJl. (9) 

i=l i=t+l 

Based on equations 5-9, the iterative process may proceed as follows. At p=0, an 
initial estimate of X^Q is generated. Then^ Q(XJ^jX^q) is generated for all possible 

sequences ofXj, From equations 7 and 8, Q(X J^Xj^) may be evaluated by generating 

yt y(Xj q) and log (PyCXt^Yt)) for each t, i, and j. Yyj(Xj q) may be generated by using the 
forward-backward algorithm as shown below: 

aiXlX)-^^ a(X|,,,Y/) = a(Xt;,Y/-^)P(X,p,Y,) ,t=l,2,..,T 

P( XL,p . Y,\, P( Xl , Y,^ P(X,p , Y, ) P( X[,,^ , Yl, ) ,t=T- 1 ,T-2, . . . , 1 
Log (Pij(Xt,Yt)) is generated for all possible X^ for t=l, 2, T and the QQs that 
maximize D(Xj^,X^o)^^ selected asX^j , After Xj^ is obtained, it is compared with 
X^o • If ^ measure D(Xj^j ,X^o ) of difference between the sequences exceeds a compare 
threshold, then the above process is repeated imtil the difference measure D(X^p,Xj^p.i) is 
within the threshold. The last X^p for p iterations is the decoded output. The measure of 

difference may be an amoimt of mismatch information. For example, if X^ is a sequence 
of symbols, then the measure may be a number of different symbols between X^ p and 

X^p.j (Hamming distance); if X^ is a sequence of real numbers, then the measure may be 

an Euclidean distance D(Xj,,Xj, J =EL(X,p - X,,.,)^] 

Fig. 2 shows a flowchart of the above-described process. In step 1000, the 

receiver 104 receives the input information sequence Yj^ and goes to step 1002. In 
step 1002, the receiver 104 selects an initial estimate for the decode output information 

sequence XJq and goes to step 1004, In step 1004, the receiver 104 generates Yt,ij(Xup ) 

where p = 0 for the first iteration and goes to step 1006. In step 1006, the receiver 104 

generates all the log (pij(Xt,Yt)) values and goes to step 1008. 

A, 

In step 1008, the recover 104 selects a sequence X^p^j that maximizes 
Q(Xj'^p^j,X^p) and goes to step 1010. In step 1010, the receiver 104 compares X^p with 
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X^p^j . If the compare result is within the compare threshold, then the receiver 1 04 goes 

to step 1012; otherwise, the receiver 104 returns to step 1004 and continues the process 

with the new sequence X^p^i . In step 1012, the receiver 104 outputs X^p^j and goes to 

step 1014 and ends the process. 

The efficiency of the above described iterative technique may be improved if the 
transmitted sequence is generated by modulators such as a trellis coded modulator 
(TCM). A TCM may be described as a finite state machine that may be defined by 
equations 1 0 and 1 1 shown below. 

St.i = /t(St, It) (10) 
Xt = &(S,I^ (11) 

Equation 10 specifies the TCM state transitions while eqxiation 1 1 specifies the 

transmitted information sequence based on the state and the input information sequence. 

For example, after receiving input It in state St, the finite state machine transfers to state 

St+i based on St and Ij as shown in equation 10. The actual output by the transmitter 102 

is Xt according to equation 1 1 . Equation 1 0 may represent a convolutional encoder and 

equation 1 1 may represent a modulator. For the above example, the transmitter output 

information sequence Xf may not be independent even if the input information 

sequence I ^ is independent. 

In equation 15, the log(Pij(Yt,Xt)) term may be analyzed based on the TCM state 

transitions because the information actually transmitted Xj is related to the soiirce 
information ^ by Xt = gt(St, It). This relationship between X^ and Ij forces many elements 
Py(Yt,Xt) of P (Yt,Xt), to zero since the finite state machine (equations 10 and 11) removes 
many possibilities that otherwise must be considered. Thus, unlike the general case 
discussed in relation to equations 5-9, evaluation of pjj(Yt,Xt) may be divided into a 
portion that is channel related and another portion that is TCM related. The following 
discussion describes the iterative technique in detail for the TCM example. 

For a TCM system with an independent and identically distributed information 
sequence, an input-output HMM may be described by eqixations 12 and 13 below. 

P(X-,Y,)=[p3,s,^P,(x. |yJ], (12) 
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where 



^ Pr(IJif S,,, = f,(S,J,) 

[0 otherwise ^ 

P(. (Yj I Xt) is the conditional PDM of receiving given that has been transmitted for 
the HMM of a medium (channel) through which the information sequence is 
transmitted; Ps s^^^ is the probability of the TCM transition from state to state Sj+i, and 
Pr(It) is the probability of an input I^. Thus, equation 2 may be written as 



1 

p,(I^Y,^)=7r,np,s,,Pc(Y.| X.)l. 



i=l 

where 71^ is a vector of the initial probabilities of the channel states, = g^{S^ and the 

product is taken along the state trajectory S^+i = ft(St ,1^) for t = 1, 2,...,T. 

If all elements of the input information sequence are equally probable, then the 
MAP estimate may be expressed by equation 14 below. 



i^=argmax7r,nP,(YjX,)l, (14) 



"1 t-i 

The auxiliary function may be expressed by equations 15-17 below corresponding to 

equations 7-9 above. 



T n n 



QOi JL) =Z S Z l°8(Pij(Y. I XO) + C (15) 

t=l i=i j=l 

where X^ = gt(St and 

YuidD-^iCYr I i;,p)MYjx,,)(3,(Y,L, 1 lL.p) (16) 
ai(Yj^'^ Ij p^) and p j (Y^^j p I^^^p) are the elements of the forward and backward 
probability vectors 

t 

a(Xl\)= 7U,nP,(YjX,)andp(Y,I, U P,(Y, |XJ1 (17) 



i=l 

From equation 15, the Viterbi algorithm may be applied with the branch metric 



V 

n p 



m(l,)=i; £ ygjfep)logp,,y(Y,|X.)l t=l,2,...,T (18) 
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to find a maximum of Q( l[ , I^p ) which can be interpreted as a longest path leading from 

the initial zero state to one of the states Sj where only the encoder trellis is considered. 
The Viterbi algorithm may be combined with the backward portion of the forward- 
backward algorithm as follows. 

1 . Select an initial source information sequence I = Ii,o ' ^ 2,0 ' - • ' ^ t,o 

2. Forward part: 



a. set a( Y" ij* ) = TT, where TT is an initial state probability estimate; 



and 



b, for t - 1 , 2, . . . compute p - g,{S^ 

a(Y/|i;,p) = a(Y/-Mi;:;)Pc(Yjx,p), 

where I{ p is a prior estimate of I{ . 

3. Backward part: 

a. set p (y^+1 I It+up ) 1 ^^^^ transition lengths L(Sj) to 0 

for all the states; 

fort = T,T-l, 1 compute: 

b. X, - g,(S, 

d. L(St)=max{L[ft(St J^lK^^Ot)}- TMs step selects the paths with 

the largest lengths (the survivors). 

e. Ij(St)=argmax{L[ft(St,It)]-i-m(lj)}. This step estimates 

corresponding to the state by selecting the 1^ of the survivor in step d. 

\ 

g. End (of "for" loop). 

4. Reestimate the information sequence: 
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It,p+i=It(St), Sj^i =ft(StJtp^jXt = U,...,T where Si=0;and 
5. If Ij ^ It p 5 go to step 2; otherwise decode the information sequence as 

I^ 

Figs, 3-6 show an example of the iterative process discussed above where there 
are four states in the TCM and T = 5. The dots represent possible states and the arrows 
represent a state trajectory that corresponds to a particular information sequence. The 
iterative process may proceed as follows. Firsts an initial input information sequence 

Il 0 is obtained, Ii o niay be the output of an existing decoder or may simply be a guess. 

The Viterbi algorithm together with the backward algorithm may be used to 

obtain a next estimate of the input information sequence if ^ , This process begins with 

the state transitions between t = 4 and t = 5 by selecting state transitions leading to each 
of the states s0-s3 at t = 4 from states at t = 5 that have the largest value of the branch 
metric L{S^) = m(l5) of equation 1 8 above. Then, the process moves to select state 
transitions between the states at t = 3 and t = 4 that have the largest cumulative distance 
L(S3) = L(S4) 4- mCIJ, This process continues until t = 0 and the sequence of input 
information if corresponding to the path connecting the states from t = 0 to t = 5 that has 

5 

the longest path L(So) = ^ m(It) is selected as the next input information sequence if j . 

t=i 

For the example in Fig, 3, state transitions from the states at t = 4 to all the states 
at t = 5 are considered. Assuming that the (IJs are binary, then only two transitions can 
emanate from each of the states at t = 4: one transition for I5 = 0 and one transition for I5 = 
1 . Thus, Fig. 3 shows two arrows terminating on each state at t = 4 (arrows are 
"backwards" because the backward algorithm is used). State transitions 301 and 302 
terminate at state sO; state transitions 303 and 304 terminate at state si; state transitions 
305 and 306 terminate at state s2; and state transitions 307 and 308 terminate at state s3. 

The branch metric m(IJ of equation 18 represents a "distance" between the states 
and is used to select the state transition that corresponds to the longest path for each of 
the states sO-s3 at t = 4: 
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°^(I5) =S ZY5.ij(Xf,o ) log Pu(X5,Y5) 

I J 

=Z Z «i(Xi'o ' Y,' )Pij(X5,o,Y3)Pj( X^o , Yi)log py(Y3 1 X5), (19) 

+ • 

i J 

where pj( q , ) = 1 , and X5 = 85(85, 15) by definition. There is an I5 that corresponds 

to each of the state transitions 301-308. For this example, L(S4) = m(l5) corresponding to 
odd numbered state transitions 301-307 are greater than that for even nnmbered state 

transitions 302-308. Thus, odd numbered state transitions are "survivors." Each of them 
may be part of the state trajectory that has the longest path firom t = 0 to t = 5, This 
transition (the survivor) is depicted by the solid arrow while the transitions with smaller 
lengths are depicted by dashed lines. 

The state sequence determination process continues by extending the survivors to 
t = 3 as shown in Fig. 4 forming state transitions 309-316. The distance between state 
transitions for each of the states are compared based on L(S4) + m(l4), where m(l^) is 
shown in equation 20 below. 

ni(l4) =Z Z Y5,U(X?.0 ) log Pij(X4,Y4) 

i J 

=X Z «i( K ' Y,' )Pij(X4.0,Y4)Pj( X,,o , Y, )log py(Y4 1 X4). (20) 

i j 

For this example, the distances corresponding to the odd numbered state transitions 309- 
3 1 5 are longer than distances corresponding to even numbered state transitions 310-316. 
Thus, the paths corresponding to the odd numbered state transitions are the survivors. As 
shown in Fig. 4, the state transition 301 is not connected to any of the states at t = 3 and 
thus is eliminated even though it was a survivor. The other surviving state transitions 
may be connected into partial state trajectories. For example, partial state trajectories are 
formed by odd numbered state transitions 307-309, 303-311, 303-313 and 305-315. 

The above process continues until t = 0 is reached as shown in Fig. 5 where two 
surviving state trajectories 320-322 are formed by the surviving state trajectories. All the 
state trajectories terminate at state zero for this example because, usually, encoders start 
at state zero. As shown in Fig. 6, the state trajectory that corresponds to the longest 

cumulative distance is selected and the input information sequence Ij (via S^+i = ft (St,It) 
that corresponds to the selected trajectory is selected as the next estimated input 
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information sequence Ij j . For this example, the state trajectory 320 is selected and the 

input information sequence 1\ corresponding to the state trajectory 320 is selected as ^ . 

Fig. 7 shows an exemplary block diagram of the receiver 104. The receiver 104 
may include a controller 202, a memory 204, a forward processor 206, a backward 
processor 208, a maximal length processor 210 and an input/output device 212, The 
above components may be coupled together via a signal bus 214. While the receiver 104 
is illustrated using a bus architecture, any architecture may be suitable as is well known to 
one of ordinary skill in the art. 

All the functions of the forward, backward and maximal length processors 206, 
208 and 210 may also be performed by the controller 202 which may be either a general 
purpose or special purpose computer (e.g., DSP). Fig. 7 shows separate processors for 
illustration only. The forward, backward maximal length processors 206, 208 and 210 
may be combined and may be implemented by using ASICs, PLAs, PLDs, etc. as is well 
known in the art. 

The forward processor 206 generates the forward probability vectors 
Ui (x|"p , Y/"^ ) herein referred to as a^. For every iteration, when a new Xj^p (or I^p ) is 

generated, the forward processor 206 may generate a complete set ofa^. 

The backward processor 208 together with the maximal length processor 210 
generate a new state sequence by searching for maximal length state transitions based on 
the branch metric m(It). Starting with the final state transition between states 

corresponding to t = T - 1 and t = T, the backward processor generates Pj(X^^i p , Y^^i ) 

(hereinafter referred as pj) as shown in equation 8 for each state transition. 

The maximal length processor 210 generates m(I^) based on the results of the 
forward processor 206, the backward processor 208 and py(Xt,Yt). After generating all 
the m(y s corresponding to each of the possible state transitions, the maximal length 
processor 210 compares all the L(St) + m(IJs and selects the state transition that 
corresponds to the largest L(St) + m(It), and the It (via S^+i = :^(St, y) that corresponds to 
the selected state transition is selected as the estimated input information for that t. The 
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above process is performed for each t = 1 , 2, T to generate a new estimate ij^ for each 
of the iteration p. 

Initially, the controller 202 places an estimate of the PDM P(X, Y) and n in the 

memory 204 that corresponds to the HMM for the channel 106 and/or the transmitter 102. 
The PDM P(X,Y) may be obtained via well known training processes, for example. 

When ready, the controller 202 receives the received input information sequence 

Yj^ and places them in the memory 204 and selects an initial estimate of IJq (or X^^). 

The controller 202 coordinates the above-described iterative process until a new estimate 

iJiioT X^i) is obtained. Then, the controller 202 compares I^q with I^^ to determine if 

the compare result is below the compare threshold value (e.g., matching a predetermined 
number of elements or symbols of the information sequence). The compare threshold 

may be set to 0, in which case I^q must be identical with 1^ , If an acceptable compare 

resuh is reached, I^j is output as the decoded output. Otherwise, the controller 202 

iterates the above-described process again and compares the estimated I^p v^th I^p_| 

until an acceptable result is reached and ij^ is output as the decoded output. 

Fig. 8 shows a flowchart of the above-described process. In step 1000, the 
controller 202 receives Yj^ via the input/output device 212 and places Y,'^ in the memory 
204 and goes to step 1002. In step 1002, the controller 202 selects an initial estimate for 
IJq and goes to step 1004. In step 1004, the controller 202 determines a new state 

sequence and a next estimated ij^ (I^p , where p = 1) (via the forward, backward and 
maximal length processors 206, 208 and 210) and goes to step 1006. In step 1006, the 
controller 202 compares I^o with I^ j . If the compare result is within the predetermined 
threshold, then the controller 202 goes to step 1008; otherwise, the controller 202 retums 
to step 1004. In step 1008, the controller 202 outputs l[p where p is the index of the last 

V 

iteration and goes to step 1010 and ends the process. 

Fig. 9 shows a flowchart that expands step 1004 in greater detail. In step 2000, 

it 

the controller 202 instructs the forward processor 206 to generate a; as shown in 
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equation 8, and goes to step 2002. In step 2002, the controller 202 sets the parameter 
t = T and goes to step 2004. In step 2004, the controller 202 instructs the backward 

processor 208 to generate Pj and the maximal length processor 210 to determine next set 

of survivors based on equation 1 8 and time t+1 survivors and goes to step 2006. 

In step 2006, the controller 202 decrements t and goes to step 2008. In step 2008, 
the controller 202 determines whether t is equal to 0. If t is equal to 0, the controller 202 
goes to step 2010; othenvise, the controller 202 returns to step 2004. In step 2010, the 

controller 202 outputs the new estimated I^ and goes to step 2012 and retums to 
step 1006 of Fig. 5. 

A specific example of the iterative process for convolutional encoders is enclosed 
in the appendix. 

While this invention has been described in conjunction with specific embodiments 
thereof, it is evident that many alternatives, modifications, and variations will be apparent to 
those skilled in the art. Accordingly, preferred embodiments of the invention as set forth 
herein are intended to be illustrative, not limiting. Various changes may be made without 
departing firom the spirit and scope of the invention. 

For example, a channel may be modeled as P,(Y | X) = PcB,(Y | X) where is a 
chaimel state transition probability matrix and B^CY | X) is a diagonal matrix of state 
output probabilities. For example, based on the Gilbert-Elliott model 

Be(X|X) = 

where X is the complement of X. For this case, m(y may be simplified as 

mat) = ^ Ym( ?p W 1 XJ, t = 1 , 2, T, and 

YmC Ijp ) = «i( Y; 1 11 M Y.I, I llu, )' where Y, 1 5g are the elements of 

B^. 



l-b, 0 
0 l-b 



andB,(X |X) = 



Z>i 0 
0 b. 
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WHAT IS CLAIMED IS : 

1 . A method for maximum a posteriori (MAP) decoding of an input 
information sequence based on a first information sequence received through a channel, 
comprising: 

iteratively generating a sequence of one or more decode results starting with an 
initial decode result; and 

outputting one of adjacent decode results as a decode of the input information 
sequence if the adjacent decode results are within a compare threshold, 

2. The method of claim 1 , wherein the iteratively generating comprises: 

a. generating the initial decode result as a first decode result; 

b. generating a second decode result based on the first decode result and a model 

of the channel; 

c. comparing the first and second decode results; 

d. replacing the first decode result vrith the second decode result; and 

e. repeating b-d if the first and second decode results are not within the compare 
threshold. 

3. The method of claim 2, wherein the generating a second decode result 
comprises searching for a second information sequence that maximizes a value of an 
auxiliary fimction. 

4. The method of claim 3, wherein the auxiliary function is based on the 
expectation maximization (EM) algorithm. 

5. The method of claim 4, wherein the model of the chaimel is a Hidden 

Markov Model (HMM) having an initial state probability vector 71 and probability density 

matrix (PDM) of P(X,Y), where XeX, YeY and elements of P(X,Y), 
Pij(X,Y) = PrO,X,Y I i), are conditional probability density functions of an information 
element X of the second information sequence that corresponds to a received element Y 
of the first infonnation sequence after the HMM transfers from a state i to a state j, the 
auxiliary function being e^^pressed as: 
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Q(Xj,Xl) = Y')log('^'(z.X^ Y,^)), where p is a number of 

2 

T 

iterations, T(z, X^, Yi^) = 7liJQpi^ ^i^(Xt,Yt),Tisanumberofinfom^ 

t=i 

in a particular information sequence, z is a HMM state sequence io . 7t is the probability 
of an initial state io,X^ is the second information sequence, Xj^ is a second information 

sequence estimate corresponding to a pth iteration, and is the first information 
sequence. 

6. The method of claim 5, wherein the auxiliary function is expanded to be: 

Q(^lK) = itt Yt4i(XL)log(p,(X.Y0) + C 

t=l i-1 j=l 

where C does not depend on Xj and 

y Oil )^a, (X t; , Yr )P, (X,p , Y, ) ft( XL,p , ) 
where a ^ (Xj p , Y^ ) and X^^j ^ , Y^^ ) are the elements of forward and backward 
probability vectors defined as 

a(X|, y;) = TiflPpC,, Yi), and ^(XJX) = flP(Xj, 1, 7t is an 

i=i i-t 

initial probability vector, 1 is the column vector of ones. 

7. The method of claim 6, wherein a source of an encoded sequence is a 

trellis code modulator (TCM), the TCM receiving a source information sequence ij and 
outputting X^ as an encoded information sequence that is transmitted, the TCM defining 
Xj = gt(St ,It) where and It are the elements of Xj and l[ for each time t, respectively, 
is a state of the TCM at t, and &(.) is a function relating X^, to 1^ and S^, the method 
comprising: 

* T 

generating, for iteration p+1, a source information sequence estimate Ij that 
corresponds to a sequence of TCM state transitions that has a longest cumulative distance 
L(St.i) at t = 1 or L(So), wherein a distance for each of the TCM state transitions is 

defined by L(St) = L(S^i) + ra(i,^, (St+i))for the TCM state transitions at each t for 
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t = 1, T and the cumulative distance is the sum of m(It (St)) for all t, m(l^ (SJ) being 
defined as 

m(it(S,))-i; X Yuj(tKPc,ij(Yt|Xt(S,))joreacht==l,2,.,.,T,whe 

i=i j=i 

Xt(St) = gt(St, it (St)), n^ is a number of states in an HMM of the channel and 

Pc,ij(Yt I Xt(St)) are channel conditional probability density functions of Yj when X^{S^ is 

transmitted by the TCM, I^ ^^i being set to a sequence of Ij for all t. 

8. The method of claim 7, wherein for each t = 1, 2, T, the method 
comprises: 

generating m( 1^ (SJ) for each possible state transition of the TCM; 

selecting state trajectories that correspond to largest L(St) = L(St+i) + m(l^^^ (S^+O) 
for each state as survivor state trajectories; and 

selecting 1^ {S^s that correspond to the selected state trajectories as It,jH-i(St). 

9. The method of claim 8, further comprising: 

a. assigning L(Sx)=0 for all states at t = T; 

b. generating m(it (Sj)) for all state transitions between states and all possible 
states St+i; 

c. selecting state transitions between the states St and St+i that have a largest 
L(St) - L(St^i ) + m( it^i (St^i)) and 1,^, (St,,) that correspond to the selected state 
transitions; 

d. updating the survivor state trajectories at states St by adding the selected state 
transitions to the corresponding survivor state trajectories at state S^+i; 

e. decrementing t by 1 ; 

f. repeating b-e until t - 0; and 

g. selecting all the I, (S^ that correspond to a survivor state trajectory that 

corresponding to a largest L(St) at t = 0 as Ij . 

1 0. The method of claim 6, wherein the channel is modeled as Pc(Y | X) = 
VJiJY I X) where is a channel state transition probability matrix and B,(Y | X) is a 
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diagonal matrix of state output probabilities, the method comprising for each t = 1 , 2, 
T: 

generating Yu( ijp ) = ai( Y/ I Ij^p M Y^, \ 1^,,^ ); 

selecting an 1^ (SJ that maximizes L(S t)=L(S t+i )+ ra(^t+i (St+i)), where m(i, (S^)) 
is defined as 

m( it (SJ) = £ Yt, i( I^p )Pj(Yt I X,(SJ), n, being a number of states in an HMM of 
the channel; 

selecting state transitions between states Sjand St+j that corresponds to a largest 

L(S J = L(S ) + m( i,,, (S,,0); and 

forming survivor state trajectories by connecting selected state transitions. 

1 1 . The method of claim 1 0, furttier comprising : 

selecting it (S^) that corresponds to a survivor state trajectory at t = 0 that has the 
largest L(St) as if for each pth iteration; 
comparing Ifp andlj^p^j ; and 

outputting I^p^i as the second decode result if I^p andlf p^j are vrithin the 

compare threshold. 

12. A maximum a posteriori (MAP) decoder that decodes a transmitted 
information sequence using a received information sequence received through a channel, 

comprising: 

a memory; and 

a controller coupled to the memory, the controller iteratively generating a 
sequence of one or more decode results starting with an initial decode result, and 
outputting one of adjacent decode results as a decode of the input information sequence if 
the adjacent decode results are within a compare threshold. 

1 3 . The decoder of claun 1 2, wherein the controller: 

a. generates the initial decode result as a first decode result; 

b. generates a second decode result based on the first decode result and a model 

of the channel; 
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5 c. compares the first and second decode results; 

6 d. replaces the first decode result with the second decode result; and 

7 e. repeats b-d until the first and second decode result are not within the compare 

8 threshold. 

1 14. The decoder of claim 13, wherein the controller searches for information 

2 sequence that maximizes a value of an auxiliary function. 

1 15. The decoder of claim 14, wherein the auxiliary function is based on 

2 expectation maximization (EM), 

1 1 6. The decoder of claim 1 5, wherein the model of the channel is a Hidden 

2 Markov Model (HMM) having an initial state probability vector 71 and probability density 

3 matrix (PDM) of P(X,Y), where XgX, YeY and elements of P(X,Y), 

4 PijP^?Y) = Pr(j,X,Y I i), are conditional probability density functions of an information 

5 element X of the second information sequence that corresponds to a received element Y 

6 of the first information sequence after the HMM transfers from a state i to a state j , the 

7 auxiliary function being expressed as: 

8 QQ^hKv) = S'^'(^^^Up' Y,^))' ^^^^^ P is ^ ^^^^^ 

z 

T 

9 iterations, ^(z, X[ , Yj^ ) = 7t ]^ p^^ (X, , Y^ ) , T is a number of information elements 

t=i 

10 in a particular information sequence, z is a HMM state sequence ij ^ ^ is the probability 

11 of an initial state io, Xf is the second information sequence, X^p is a second information 

12 sequence estimate corresponding to a pth iteration, and is the first information 

13 sequence. 

1 17. The decoder of claim 16, wherein the auxiliary function is expanded to be: 

2 Q(X^X^,) = ttt YMj(X^p) log (P,(X., YO) + C 

t=l i=l j=l 

3 where C does not depend onXj and 



Yui(X^p)=ai(X;:;,Yr)py(X,p,Y,)P,(X,\,p,Y.t.) 

where a ; (Xj p , ) and pj( X^^, p , Y,'^, ) are the elements of forward and backward 
probability vectors defined as 
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a( X; , - 71 n P(X J , , and p (X^^i , Y,^,i ) = n P(^^ 

probability vector, 1 is the column vector of ones. 

1 8. The decoder of claim 1 7, wherein a source of an encoded sequence is a 

trellis code modulator (TCM), the TCM receiving a source information sequence If and 
outputting as an encoded information sequence that is transmitted, the TCM defining 
= g^(St Jt) where X^ and It are the elements of X[ and for each time t, respectively, S, 
is a state of the TCM at t, and is a function relating X^, to 1^ and S^, the controller 
generates, for iteration p+1, an input information sequence estimate Ij p+, that 
corresponds to a sequence of TCM state transitions that has a longest cvimulative distance 
L(S,.i) at t = 1 or L(So), wherein a distance for each of the TCM state transitions is 

defined by L(S,+i) = L(S,+i) + m( (S,+i)) for the TCM state transitions at each t for 

A A 

t = 1, T and the cumulative distance is the sum of m(It(St)) for all t, m(Ij(St)) being 
defined as 

4.(S.))=i; t Yujfc>ogPc,ij(Y.|X.(S.)),foreacht=l,2,...,T, where 

i=i j=i 

Xt(St) = g,(S„ it (SJ), n^ is a number of states in an HMM of the channel and 

p„ i:(Y, I Xt(St)) are channel conditional probability density functions of Y, when X,(St) is 

A 

transmitted by the TCM, I^ p+i being set to a sequence of 1^ for all t. 

1 9. The decoder of claim 1 8, wherein for each t = 1 , 2, T, the controller 
generating m(i, (S^)) for each possible state transition of the TCM, selecting state 
trajectories that correspond to largest L(SO = L(S,+,) + m(i,^, (8,+,)) for each state as 
survivor state trajectories, and selecting 1,^, (S,+i)s that correspond to tiie selected state 

trajectories as I^i_p+i(Sy.,)- 

20. The decoder of claim 19, wherem the controller: 

a. assigns L{Sj )^ for all states at t = T; 

b. generates m{l^ (S"3) for all state transitions between states and all possible 
states Sj+i; 
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c. selects state transitions between the states and S^+j that have a largest 
L(St) - L(S,^j ) + m(it^i (St^O) and I^^^ (S,^i) that correspond to the selected state 

transitions; 

d. updates the survivor state trajectories at states by adding the selected state 
transitions to the corresponding survivor state trajectories at state St+i; 

e. decrements t by 1; 

f. repeats b-e until t = 0; and 

g. selects all the 1^ (S^) that correspond to a survivor state trajectory that 

corresponding to a largest L(S^ at t = 0 as ij'p^j . 

2 1 . The decoder of claim 20, wherein the channel is modeled as P^Y I X) = 
P,B^(Y 1 X) where is a channel state transition probability matrix and BXY 1 X) is a 
diagonal matrix of state output probabilities, for each t = 1, 2, T, the controller: 

generates y,i(I^p) = ai(Y/ I llM^lx i I'l.p); 

selects an 1^ (S^ that maximizes L(S ,)=L(S )+ m(i,,i (S,^,)), where m( 1^ (SJ) is 
defined as 

m( it (St)) = X Yt. i( I^p )Pj(Yt 1 Xt(St)), n, being a number of states in an HMM of 

1=4 

the channel; 

selects state transitions between states Stand S^^^ that corresponds to a largest 

L(St) = L(St.i) + ni(I,,,(St.i));and 

forms survivor state trajectories by connecting selected state transitions. 

22. The decoder of claim 21, wherein the controller selects I, (S J that 
corresponds to a survivor state trajectory at t = 0 that has the largest L(SO as I, for each 
pth iteration, compares l[p andl?"p^, , and outputs I, as the second decode result if ^ 

and I^p^i are within the compare threshold. 
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ABSTRACT OF THE DISCLOSURE 
This invention provides an iterative process to maximum a posteriori (MAP) 
decoding. The iterative process uses an auxiliary function which is defined in terms of a 
complete data probability distribution. The auxiliary fimction is derived based on an 
expectation maximization (EM) algorithm. For a special case of trellis coded modulators, 
the auxiliary function may be iteratively evaluated by a combination of forward-backward 
and Viterbi algorithms. The iterative process converges monotonically and thus 
improves the performance of any decoding algorithm. The MAP decoding minimizes a 
probability of error. A direct approach to achieve this minimization results in complexity 
which grows exponentially with T, where T is the size of the input. The iterative process 
avoids this complexity by converging on the MAP solution through repeated 
maximization of the auxiliary function. 
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Appendix 

MAP Decoding in Channels with Memory 

William Turin, Senior Member, IEEE 

Abstract 

The expectation-maximization (EM) algorithm is popular in estimating parameters of various 
statistical models. In this paper, we consider applications of the EM algorithm to the maximum 
a posteriori (MAP) sequence decoding assuming that sources and channels are described by 
hidden Markov models (HMMs). HMMs can accurately approximate a large variety of 
communication channels with memory and, in particular, wireless fading channels with noise. 
The direct maximization of the a posteriori probability (APP) is too complex. The EM algorithm 
allows us to obtain the MAP sequence estimation iteratively. Since each step of the EM 
algorithm increases the APP, the algorithm can improve performance of any decoding procedure. 

Keywords: Maximum a posteriori decodmg, EM algorithm, hidden Markov model, fading 
channel 
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1. INTRODUCTION 

Maximum a posteriori (MAP) sequence decoding is optimum, because it minimizes the 
probability of error. ^^^^ It is usually performed by the Viterbi algorithm and most recently by 
the turbo decoding algorithm for a special class of codes. These algorithms are directly 
applicable if communications channels are memoryless. However, they perform an approximate 
MAP decoding if channel errors are bursty which is the case in wireless commimications due to 
fading. It is usually very difficult to find a sequence MAP estimate directly for channels with 
memory. The expectation-maximization (EM) algorithm has been successfully applied by 
Georghiades and Han, Zeger and Kobayashi, ^^^^ and Georghiades '•^^ to decoding information 
transmitted over the fading wireless channel. In these papers, the fading channel is modeled by a 
complex Gaussian process. 

Alternatively, the wireless channel can be modeled by hidden Markov models (HMMs). 
[22,23] shown that HMMs are general enough to approximate not only fading, but also 

Other types of signal distortion such as interference and non-Gaussian noise. ^ In this paper, 
we demonstrate that the EM algorithm can be applied to MAP decoding if channel errors are 
described by an HMM. In our approach, the signal parameters are obtained by applying the EM 
algorithm to maximizing the a posteriori probability (APP) of the transmitted symbols. 

In developing the MAP decoding algorithm it is usually assumed that the conununication 
channel is memoryless which is achieved by interieaving. However, in the majority of real 
channels the error dependence extends over long time intervals. Complete independence is 
impossible to achieve due to the information delivery delay constraints and system memory 
limitations. We should also remember that a memoryless channel has lower capacity than a 
channel with memory with the same bit-error rate. ^'^'^^^ Therefore, it is important to develop 
decoding algorithms for channels with memory. 

There are many different models of channels with memory which correspond to various 
channel impairments such as fading, interference, and noise. All of them can be accurately 
approximated by hidden Markov models (HMMs). It can be shown that HMMs represent a dense 
family allowing us to approximate a large variety of stochastic processes. Their application in 
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3 



many diverse fields (such as speech, image, and handwriting recognition, experimental genetics, 
sociology, stock market modeling, etc) serves as experimental evidence of their generality. 
Another reason for their popularity is the relative simplicity of their use. 

In this paper we develop the MAP decoding algorithm for a general input-output HMM 
(lOHMM) which, as we will see, incorporates source and channel HMMs. For the special class 
of models, the decoding can be performed using the Viterbi algorithm. However, in the general 
case this algorithm gives only an approximate solution. We demonstrate that this approximate 
solution can be improved by the expectation maximization (EM) algorithm. The paper is 
organized as follows. In section 2, we discuss the relationship between various decoding criteria. 
In section 3, we consider the lOHMMs and their application to computing the transmitted 
symbol sequence APP. In section 4, we develop the MAP EM decoding algorithm and illustrate 
its application to decoding of block and convolutional codes. 

2. MAP DECODING 

Suppose that we have an input-output system whose input is described by a sequence of 
symbols X\ = (Xi ,Xt) and the corresponding output is Y\ ^ {YxJi. ^ ^ • .Yt\ Our 

goal is to choose the most probable input which produced the observed output l^i . 

The optimal estimator that maximizes the probability of X{ correct decoding is the maximum 

[271 

a posteriori (MAP) estimator: 

if = argmax PKJrf I y[) (0 

where Pr{-) denotes the corresponding probability or probability density function. Since the 
maximization does not depend on Y\, the MAP estimate can be obtained by maximizing the 
urmormalized APP 

i[ = argmax Pr{X{,Y\) = argmax Pr( Y\ \ X[)Pr{X{). (2) 

It follows from this equation that the maximum likelihood (ML) and MAP estimates coincide if 
the input is uniformly distributed. This assumption is often made when there is no information 
about the input probability distribution. ^'^'^ However, better results can be achieved if we 
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exploit the input (source) statistics. Since the ML estimate can be viewed as a special case of 
the MAP estimate, we consider only the latter in the sequel. 

It is often necessary to estimate a subset of the input sequence and in particular a single 
symbol: 

X, = argmaxPr(Xt,Y\). (3) 

In many applications the estimation must be performed before receiving the whole sequence 
yJ. If estimation ofX, is made on the basis of 7*1+^ with t>0, it is called a fixed lag smoothing; 
if T=0, it is called a filtering; if t<0, it is called a prediction. To solve the previous equations we 
need to develop algorithms for calculating the corresponding probability measures and their 
maximization. This is done by modeling the input-output system. 

3. fflDDEN MARKOV MODELS 

An input-output HMM \ = iS,XJ,ic,{?(x,y)}) is defined by its internal states 
S = {1,2,. ..,«}, inputs X, outputs Y, initial state probability vector n, and the input-output 
probability density matrices (PDM) ?{x,y), xeX, yeYv/hose elements Pijix,y) = Pr{J,x,y \ i) 
are the conditional probability density functions (PDF) of input x and corresponding output y 
after transferring from state i to state j . It is assumed that the state sequence S'o = {So,Si,...,S,), 
input sequence Jf, = (A", ,X2,...,X,), and output sequence Y'l = (Fi ,72, . . . , J',) possess the 
following Markovian property 

PriS,^„Y, I S^-',^r',11-') = Pr{S„X„Y, \ S,_i). 

According to this model, the PDF of the input sequence Xj and output sequence Y\ has the form 

[23] 

pK^f.rf) = s np(z,,7,) 1 (4) 

1 = 1 

where 1 is a column vector of n ones. 

If the source sequence is modeled by an autonomous HMM with states S\^^ and PDM 
P,(;c) = [PriS\'\x\S\%)]„_,„, and, for each input sequence, the channel is modeled by the 
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conditional HMM with states S\^^ and PDM PcOl^) = [PriS^'KylS'j'^i ,x)]„^^„^, then the 



combined lOHMM is described by the PDM of the form 



(5) 



where (g) denotes the matrix Kronecker product. In other words, the combined model is an HMM 
whose states represent all possible combinations of the transmitted sequence states and channel 



states. 



In the most popular model, the conditional output PDM has the form 



(6) 



where = l>y]n.,«, is the channel state transition probability matrix and Bc(Y\X) is a diagonal 
matrix of the state output probabilities. For example, according to the Gilbert-Elliott ^ ' ^ 
model, the channel has two states: "good" and "bad". In the good state, errors occur with small 
probability 6, while in the bad state they occur with larger probability 62- If we assume that the 
first state is good and the second state is bad, then the model is described by equation (6) with 



1-61 0 
0 1-62 



61 0 
0 62 



(7) 



where 1 denotes the complement of X The two-state model is the simplest HMM for channels 
with memory. Models with larger state space are often needed. ^^^'^^^ There are several HMMs 
for fading channels with additive white Gaussian noise (AWGN). In these models, the HMM 
states are usually associated with the channel fade levels while the state output conditional 

ri3 18 22 231 

probability density fimctions are Gaussian l > > ' J 



(8) 



Let us consider now the transmitted sequence modeling by HMMs. Usually, the source 
HMM is obtained by fitting it to experimental data. This process is called the HMM training. 
^'^^ If the transmitted sequence is generated by a trellis coded modulator (TCM), f^'^^'^^J then it 
can be described as a finite state machine: 
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After receiving an information symbol /, in state S^'^ the machine transfers to state 
S^t'^i = MS\'^ Jt) and outputs a symbols, = gt{SY^ Jtl The system is called TCM, because its 
state transition graph resembles a trellis whose nodes represent the states S^'^ for r = 1 ,2 , T and 
edges represent all possible state transitions. In a typical implementation, the first equation in 
(9) represents a convolutional encoder while the second equation represents a modulator. 

It follows from this description that the modulator output symbols are not independent, even 
if the source symbols are independent. From a statistical point of view, TCM can be described 
as a method of creating correlated sequences that are resistant to channel errors, allowing us to 
recover the source sequences with high reliability. It follows from the TCM definition that if the 
source symbols can be modeled by an HMM, the output process is also an HMM. 

4, MAP SEQUENCE DECODING VIA THE EM ALGORITHM 

If the channel is modeled by an lOHMM, the MAP estimate of the sequence Jr[ maximizes 
the right hand side of equation (4). However, as we can see, the direct maximization is a 
difficult problem. IfXt is a discrete variable, then we need to consider all possible sequences X\. 
Thus, the complexity grows exponentially with T. In the special case, when the input sequence 
X\ is uniquely determined by the sequence of states S[, the maximum can be found by the 
Viterbi algorithm. In this paper, we show that the general problem can be solved iteratively 
using the EM algorithm which, in our case, is a combination of the forward-backward and 
Viterbi algorithms. The EM algorithm converges monotonically, therefore, even single 
step of the EM algorithm allows us to improve performance of any decoding algorithm. 

To develop the algorithm, we define the complete data probability distribution as 
[23, Sec. 3.2.2] 

where z = ij is the HMM state sequence and PijiX.Y) are the elements of the matrix V(X,Y). 
The MAP sequence estimate of equation (2) can be obtained iteratively by the following EM 
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algorithm ^^'^^ 

^Up + i = argm^Q{X\ ,Xlp) , /?=0,1,2_ (U) 

where Q{X\,x\^p) is the auxiliary function which, in our case, can be written as 

Q{X\,Xlp) = 2v(z,Z[,p,7[)logv(z,Jr[,7[). 

Z 

Using the relationship between equations (10) and (4) we can rewrite the auxiliary function in 
the following form ^^^^ 

Q{X\^Xl,) = I i iY.,;(Jr[,,)logp,y(^„r,) + C (12) 
where C does not depend on X\ 

a,(Jf*i,p,yi) and ^i{Xj+i^p,Y]+\) are the elements of the following forward and backward 
probability vectors 

T 

a(J^i,l1) = TiYinXiJi) and p(J^i,l1) = UnXiJi)!^ (13) 

1=1 i^t 

It follows from these equations that the forward probability vectors can be evaluated by the 
forward algorithm 

a(Al,l1) = K, = o,{A'\n'')nXtJt) (14) 

and the backward probability vectors can be evaluated by the backward algorithm 

^{xi^xji^x) - 1, HxLYh = nxsJtmxix^yli)- (is) 

As we can see, the auxiliary function e(Jff,Jf[p) is much simpler than priXlJJ), In the 
majority of practical cases it is not difficult to find a maximum of the auxiliary function. For 
example, if source coded speech signal, which is modeled by an HMM with the mixture of 
Gaussian state-conditional PDFs, is transmitted over the slowly fading channel with additive 
Gaussian noise, it is possible to find the closed form solution of equation (1 1). ^^^^ However, the 
matrix PiX^Jt) size might be large. The previous equations dimensionality can be reduced if 
this matrix has a special form. 
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5. MAP DECODING OF TCM SIGNALS ON CHANNELS WITH MEMORY 

For a TCM system with an i.i.d information sequence, equation (5) takes the form 

YiX„Y,) = ip^-^^-^VAYt\X,)] (16) 



where 



Pril,) if S^i =A{S\'\I,) 
0 otherwise 



(17) 



which means that the matrix P(^, Jt)^^ sparse. In this case, we can write 

r 

Prill = ^cnW/>si;>,Pc(J^/ 

(=1 

Here and in the following equations, %c is a vector of the initial probabilities of the channel 
states, Xt = gf(^^^ ,//), and the product is taken along the state trajectory S^'^i = /,(S^'^ for 
r= l,2,...,r. Thus, for the TCM system, instead of a large sparse matrix P(^f ,7,), we can use 
smaller matrices P^(y, | X,) for computing the AFP. 

If we assume that all the information symbols are equiprobable, then the MAP estimate is 
equivalent to the ML estimate: 

The auxiliary function can be also written in terms of the smaller matrices: 

QillJlp) = i Ii;YM7(/L)IogPc.y(J^.i^r) + C (18) 

t = \ i = ly=l 

where = g;(5^^> ,7^) and 

jtjjiiip) = ai(ii-M/\:;);^c,/;(i^ri^.^)Py(yr.iUr.i,.) 

o^iiY^x'^ Uup) and P/rf+i \lL\^p) are the elements of the forward and backward probability 
vectors 

a(i1 Ifi) = n ^AYA^t) and p(l1 = U Pc(I',l^,)l (20) 
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which can be computed by the forward and backward algorithms, respectively. 

It follows from equation (18) that we can apply the Viterbi algorithm with the branch metric 

mUt) = ziyt.u(^lp)^^^Pc.ui^t\X^), ? = l,2,...,r (21) 

to find a maximum of Q(/[,/[,p) which can be interpreted as a longest path leading from the zero 
state to one of the states (Note that we are considering only the encoder trellis.) It is 
convenient to use the Viterbi algorithm in the backward direction to combine it with the 
forward-backward algorithm. In the forward direction, we compute and save in memory 
a(5l |/i,p) for f = l,2,...,r; in the backward direction, we compute recursively fi(Yf^\ \lh\,p) for 
^ = r- l,r-2,...,l, then we compute y^jj (/[^ ) and, for each encoder state at time t, determine the 
longest path starting from this state with branch metric m(I^) and the corresponding input 
sequence generating this path. Thus, the described EM algorithm can be summarized as follows: 

1. Select initial sequence /f,o = ^i.o Ji.o > • • • > ^r,o 

2. Forward part: 

Seta(l1|7?) = 7candforr = l,2,...,rcomputeX,,^ = gt{S?\lt,p\ 

oLiAUip) = <^iA-'\i\:p)fciYAx,^p) (22) 

3. Backward part: 

Setp(rf+i|/?.,i.p) = land the survivors ^"^hengths I (S^^^) = 0 for5V'^ = l,2,...,«,. 
For r = r,r-l,...,l computed, = gt(S^'\li\ 

y^ijUip) = aKr^^i/^:;)Pc,(;■(^^rl^^p)Py(^^^.llA'^l,p) 

L{S^'^) = max{L[/,(5i^\/,)] + m(/,)} 

4. Reestimate the information sequence: 

AAA 

It,p+\ = hiSt)^ St + \ = ftiStJt^p + \)y r=l,2,...,r 

where 5i = 0 
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5. lfltp^\i^lt^p, go to step 2; otherwise decode the information sequence as . 
Note that if the terminal state is forced to be zero, then we set 1(5^^) = -oo for S^^ ^0. 

It is clear from the algorithm's description that the number of operations required by the 
algorithm is k = NfNfB'Ny where Nfs is the number of operations required by the forward- 
backward algorithm, ^^^NyislhQ number of operations required by the Viterbi algorithm, ^"^^ and 
Nf is the number of iterations. The number of iterations is a random number which depends on 
the initial approximation of the decoded sequence. Because of the discrete nature of the decoded 
sequence, the number of iterations is usually small. In our simulation study, for the code 
described in Sec. 5.2 and channel of Ref. [10], we compared the EM algorithm with the 
exhaustive search (for short sequences). It took less than three iterations for the algorithm to 
converge for various initial guesses. 

.The forward-backward algorithm requires saving in memory all the forward probability 
vectors. The backward part of the algorithm starts only after the whole sequence Y\ has been 
received. This can cause a significant problem if the sequence is long. The problem can be 
solved by using an approximate forward-only fixed-lag algorithm. ^^^^ By combining the fixed- 
lag algorithm with the above EM algorithm it is possible to perform the decoding in the 
forward-only fashion. The combined algorithm lends itself to parallelization and pipelining. 

This algorithm can be applied to a general lOHMM. For the model special cases the branch 

metric can be simplified. In particular, if the model parameters are given by equation (6), we can 
[23] 

wnte 

milt) = I,Uiillp)bjiYt\Xt), t = U2.-J- (24) 

i = t 

where 

Ym(/[,p) = a,(l^i|/'u,)PKJ^f-.i|/r.i,p) 
For the fading channel with AWGN the algorithm can be simplified. Substituting equation (8) 
into equation (18) we conclude that the maximization of QiliJi.p) is equivalent to 
minimization of the following quadratic form 
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R(llJip) = i ZytA^l^) II Y,-a,X, II 2 (25) 
which can be accomplished by the Viterbi algorithm with the branch metric 

m(It) = iJtAllp) II Y,-aiX, II ^ r = l,2,...,r. (26) 
For a PSK modulated sequence, we can write 

R(llllp) = -2 2 ZytA^lp)Ro{Xj:a,} + C (27) 

where C is independent of /[ . Thus, we can use a simpler metric 

fi(/r) = "iy^A^lpW^J^^i} (28) 

in the Viterbi algorithm. 

To improve the algorithm's convergence rate, it is important to select a good initial 
approximation of the decoded sequence. This can be achieved by using one of the suboptimal 
decoding algorithms. For example, we can use the symbol MAP estimate according to equation 

ri5i 

(3), the Viterbi algorithm for the most probable path, ^ or an algebraic decoding algorithm. 

To illustrate the algorithm applications, we consider two examples. 

5.1 Map Decoding of Block Codes 

It is well known that block codes can be interpreted as trellis codes in the following 
way. Let H = [ Aj h2 Ajy ] be a parity check matrix of a linear {N,K) code. Define the 
encoder states recursively as 

So =0, 5, = 5,_i + I,h,. (29) 

N 

Since a codeword ![ is in the null-space of H (/i^H = ^^tht = 0), the trellis encoder needs to 

keep only the trajectories leading to state S/^=0, If the code is systematic, the first K symbols are 
the information symbols if giveaby the source. The parity check symbols I^+i are uniquely 
defined by the path leading from state S^: to state S^=0. If the source is binary, there are only 
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two possible transitions from the states corresponding to information bits. 

Equation (29) has the form of the first equation of (9). In block-code-based TCM, segments 
of a codeword are mapped into the modulator constellation. If we have a discrete channel, 
the modulator is considered to be a part of the channel and the encoded symbols are transmitted 
directly to the channel. In this case, the second equation of (9) takes the iormXt = If Thus, both 
equations (9) are satisfied and we can apply the EM decoding algorithm described in this section 
if the channel is modeled by an HMM. 

To be more specific, consider a (5,3) block code with the parity check matrix 



H — 



110 10 
0 110 1 



The code trellis diagram is depicted in Fig. 1. This code has four states: (00), (01), (10), and 
(11). The encoder output bits mark the corresponding state transition edges: the (horizontal) 
transitions between the same states correspond to 0 and transitions between different states 
correspond to 1. For example, for the &st observation y, we have h , = (1,0)' which is the first 
column of the matrix H. 

Suppose that the channel is described by the Gilbert-Elliott model represented by 
equations (6) and (7). According to these equations, we have 



P(0!0) = P(l!l) = 



/'2l(l-*l) P22(l-*2) 



P(0|1) = P(1|0) = 



Let us assume that the sequence Ff = 01010 was received. As an initial approximation of the 
decoded sequence we choose the closest to the received sequence Y\ codeword = 0101 1- 



For this codeword, we compute 



a(}',|0)=«,Pc(0l0),a(7i|01)=a(r,l0)P,(l|l).a(rl|010) = a(r?|01)P,(0|0) 
a(l1|0101) = a(rl|010)P,(l|l), a(rflOlOll) = a(yt|0101)P,(0|l). 



This completes the forward part. In the backward part, we compute 
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L(54 = 00) = miXs=0) = Y5,ilog(l-*i)+y5.2log(l-62) 
Z,(S4=01) = m(X5 = l) = Y5,ilog^'i+75,2log62 

whereys,, = a,(rf 101011)P,- and3, = l. Then we compute 3(75 1 1) = P<,(0|l)land 
Z,(53=00) = miX4 = 0) + m{Xs=0), L(S3=01) = miX^ = 0) + m{Xs = l) 
1(53 = 10) = m{X4 = l) + m{X5=0), L(S3 = 11) = ^(^4 = l) + m(X5 = 1). 

In the next step, we need to apply the Viterbi algorithm: Choose A's = 0 if 

i(52=00) = I(53=00)+m(Z3=0) > I(53=01) + /n(X3 = l). 

Otherwise, we choose A'3 = l and I(S2=00) = I(53=01)+m(^3 = l), and so on. At the end of 
this iteration, we find the longest path connecting the nodes at f=0 and r=5. The corresponding 
encoder output sequence is the next approximation of the decoder output. The process continues 
until two consecutive iterations deliver the same result. 

5.2 Map Decoding of Convolutional Codes 

As a second example, let us consider a convolutional code. The encoder operation can be 
described by the following equations 

S^^+>, = S\'^A + I,B 

where 5, is the encoder state vector and 2 ^ is the encoder output vector. The encoder initial state 
is usually selected as 5 1 =0. 

In the TCM standard implementation, the encoded symbols are transformed by a memoryless 
mapper to produce a symbol in the modulator constellation: 

X, = F(E,) - F(5l^>C + /,G) = MS^'Kltl (31) 

As we can see, equations (30) and (31) have the form of equations (9). If we assume that 
symbols are transmitted over a channel represented by equation (6), then we can apply the 
EM algorithm described in this section. 

To be more specific, suppose that the source is binary, memoryless, and symmetric 
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(p(0) = = 0.5). The source bits are coded by the rate 1/2 convolutional encoder shown in 
Fig. 2. 



If we denote the encoder output vector as 2, = ^,2]. then according to this figure we 



have 



(32) 



where the encoder state is defined by the contents of its shift registers which is (see Fig. 2) 
S\''> = Ui-iJt-i] aiid its state transition probability matrix has the form 



P. = 



0.5 0 0.5 0 

0.5 0 0.5 0 

0 0.5 0 0.5 

0 0.5 0 0.5 



The encoder output symbols are mapped to the QPSK constellation which consists of four 
symbols: ;^(«> = {-A,-A), X^'^ = {-A,A), X^^^ = {A -A), and X^'^ = (A^A). Equation (16) 



takes the form 



p(/=o,y)=o.5 



p,(7|;^*«) 0 00 

p,(y|^'>) 0 00 

0 r^{Y\x^^^) 0 0 

0 ?,iY\x^^^) 0 0 



P(/=l, 10=0.5 



0 0 ?ciY\X^^'>) 0 

0 0 f,{Y\X<^'>) 0 

0 0 0 p<,(y|A^") 

0 0 0 ?c(Y\X^^^) 



(33) 



Similarly to the previous example, we can apply the EM algorithm of this section to decoding of 
a terminated convolutional code. If a code sequence is too long, an approximate fixed-lag 
algorithm can be applied in a forward-only fashion, ^^^hf the channel is described by equation 
(6), we can apply the EM algorithm whose maximization part is performed using the Viterbi 
algorithm with the branch metric (24). 

However, if a bit-level model is described by equation (6), it does not mean that the encoded 
symbol model has this form. In this case we need to apply a more complex metric (21). To be 
more specific, suppose that the channel is described by the Gilbert-Elliott model with parameters 

[3,10] 
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Pc = 



Thus, according to equation (6), we have 



Pc(0) = 



'0,996003 0.002520] p fn = 
.0.033966 0.811440J' ^ 



0.000997 0.000480 
L 0.000034 0.154560. 



Suppose that the code starts and terminates at the state 00 and has three information bits. The 
encoder trellis diagram along with its output/input symbols are depicted in Fig. 3. For every 
information bit, the encoder produces a two-bit symbol which is presented in decimal form in 
Fig. 3. Thus, we have the following matrix probabilities of errors in the received two-bit 
symbols: 



P(0) = 
P(2) 



- [0-992108 0.004555] p/i>| 
'^^^ ~ LO. 061392 0.658520J' ^ ^ 

- T» n\i> - [0.001009 0.000392] 

- ^c{n^c(^) - [0.005284 0.125416J' 



: P,(0)P,(1) = [ 
P(3) = P?(l) 



0.000993 0.000868] 
0.000061 0.1 25432 J* 



„ [ 0.000001 0.000075] 
" L 0.000005 0.023 889 J' 



These matrices do not satisfy equation (6), therefore, we cannot use the simplified metrics as in 
the previous examples and must use a more complex metric (21). Equation (16) takes the form 



p(/=:0,r)=o.5 



?^{Y) 0 0 0 

p,(r©3) 0 0 0 

0 P,(Y®2) 0 0 

0 Pc:(i'©l) 0 0 



p(/=i,r)=o.5 



0 0 p,(r®3) 0 

0 0 Pc(n 0 

0 0 0 Pc(y®^) 

0 0 0 Pc(>'©2) 



(46) 



where ® denotes bitwise exclusive-or operation. Suppose that we received Y\ = (0,3,2,1,0). 
As an initial guess for the transmitted codeword we choose Xq = 0,0,0,0,0. Using the 
algorithm described on page 9, we can find a maximum of QiliJup)- Obviously, it is 
equivalent to finding a minimum of -Q{l\jlp) which can be achieved using the Viterbi 
algorithm by finding the shortest path on the trellis using negative metric in equation (21). We 
obtained after first iteration the following lengths of the shortest paths connecting each node of 
the trellis with its terminal node: 



state 


t=l 


t = 2 


t = i 


t=4 


f = 5 


0 


0.074-10-^ 


0.0618- 10"' 


0.0783-10 ' 


0.0443- IQ-" 


O.OIOMQ-' 


1 






0.0784- 10 " 


0.0442-10 ' 


0.073 1 10 ' 


2 




0.0896- 10"^ 


0.051 1-IQ-' 


0.1344-10 ' 




3 






0.0511 10-^ 


0.0800-10 " 
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The decoded sequence /i = 0,1,0,0,0 and the corresponding transmitted sequence 
x\ = 0,3*2,3,0 After second iteration we obtained the following length 



state 




t = 2 


t = 3 


f = 4 


f = 5 


0 


0.0248 


0.0229 


0.0344 


0.0177 


0.0020 


1 






0.0341 


0.0171 


0.0289 


2 




0.0382 


0.0204 


0.0567 




3 






0.0204 


0.0319 





and the same decoded sequence as in the previous iteration = 0,1,0,0,0. 

As a test, we performed an exhaustive search which gave the same result with a maximum 
likelihood value of 0.940201 . For the HMM fading channel with the AWGN we can use metric 
(28). 

6. CONCLUSION 

We have demonstrated that if an information source, encoder, and communication channel 
are modeled by lOHMMs, then MAP decoding can be realized using the EM algorithm. The 
expectation part of the algorithm is performed using the forward-backward algorithm while the 
maximization part is accomplished using the Viterbi algorithm. The EM algorithm is robust and, 
in contrast with some other iterative algorithms, it converges to the APP maximum in all 
practical cases. Because the set of transmitted symbols is discrete, the number of necessary 
iterations is usually small which was confirmed by a direct simulation. 

7. ACKNOWLEDGMENT 

Helpful comments and suggestions of the associate editor Dr. Steven S. Pietrobon and the 
anonymous reviewers are gratefully acknowledged. 

References 

[I] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing 
symbol enror rate," IEEE Trans, Inform, Theory, vol IT-20, pp. 284-287, Mar. 1974, 

[2] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via 



Turin: MAP decoding in channels with memory 



the EM algorithm," J. R. Statist 5oc.,voL 76, 1977, pp 341-353. 

[3] E. O. Elliott, "Estimates of error rates for codes on burst-noise channels," Bell Syst, Tech. J., vol. 
42, Sept 1963, pp. 1977-1997. 

[4] G.D. Forney, Jr., 'The Viterbi algorithm," Proc, IEEE, vol. 61, pp. 268-278, Mar. 1973. 

[5] G.D. Forney, Jr., "Convolutional codes I: algebraic structure," IEEE Trans. Inform. Theory, vol 
IT-I6, Nov. 1970, pp. 720-738. 

[6] G.D. Forney, Jr., R.G. Gallager, G.R. Lang, F.M. Longstaff, and S.U. Qureshi, "Efficient 
modulation for band-limited channels," IEEE J. Select. Areas Commun,, vol. SAC-2, pp. 632-647, 
Sept. 1984. 

[7] J. Garcia-Frias and J. D. Villasenor, "Exploiting binary Markov channels with unknown 
parameters in turbo coding," GLOBECOM'98 Conf Record Sydney, 1998, pp. 3244-3249. 

[8] C. N. Georghiades and J. C. Han, "Sequence estimation in the presence of random parameters via 
the EM algorithm," IEEE Trans, Comm,, vol 45, no. 3, March, 1997. 

[9] C. N. Georghiades, "Maximum-likelihood detection in the presence of interference,"Proc. IEEE 
Symp. Inf. Theory, Aug. 1998, p. 344. 

[10] E. N. Gilbert, "Capacity of a burst-noise channel," Bell Syst. Tech. 1, vol. 39, Sept. 1960, pp. 
1253—1266. 

[11] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," IEEE 
Trans. Inform. Theory, col IT-23, pp. 371-377, May 1977. 

[12] L. N. Kanal and A. R. K. Sastry, "Models for channels with memory and their applications to error 
control," Proc. of the IEEE, vol. 66, no. 7, July 1978. 

[13] J. H. Kang, W. E. Stark, and A. O. Hero, "Turbo codes for fading and burst channels," 
GLOBECOM '98 Communications Miniconference, Sydney, 1998, pp, 183-188. 

[14] P. Ligdas, W. Turin, and N. Seshadri, "Statistical methods for speech transmission using hidden 
Markov models," in Proc. 31th Conf on Inf Set and Sys., pp. 546-551, March 1997. 



Turin: MAP decoding in channels with memory 1 8 

[15] N. Merhav and Y. Ephraim, "Hidden Markov modeling using a dominant state sequence with 
applications to speech recognition,'* Computer, Speech and Language, vol. 5, 1991, pp. 327-339. 

[16] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood 
Cliffs, New Jersey, 1993. 

[17] S.O. Rice, "Distribution of the duration of fades in radio transmission: Gaussian noise model," 
BellSyst. TechJourn,, vol 37, May 1958, pp. 581-635. 

{18] M. Sajadieh, F.R. Kschischang, and A. Leon-Garcia, "A block memory mode 1 for correlated 
Rayleigh fading channels," Proc. IEEE Int. Conf. Commun., June 1996, pp. 282-286. 

[19] S. Sivaprakasam and K. S. Shamnugan, "A forward-only recursion based HMM for modeling 
burst errors in digital channels," GLOBECOM, 1995. 

[20] F. Swarts and H.C. Ferreira, "Markov characterization of channels with soft decision outputs," 
IEEE Trans. Commun., vol. 41, May 1993, pp. 678-682. 

[21] W. Turin, "MAP symbol decoding in channels with error bursts," in Proc, 33 Conf on Inf ScL 
andSys,, pp. 674-679, March 1999. 

[22] W. Turin and R. van Nobelen, "Hidden Markov modeling of flat fading channels," IEEE JSAC, 
vol. 16, no. 9, Dec. 1998, pp.1809-1817. 

[23] W. Turin, Digital Transmission Systems: Performance Analysis and Modeling, McGraw-Hill, 
New York, 1998. 

[24] G. Ungerboeck, "Channel coding with multilevel phase signals," IEEE Trans. Inform. Theory, col 
IT-28, pp.55-67, Jan. 1982. 

[25] H.S. Wang and N. Moayeri, "Finite-state Markov channel - a useful model for radio 
communication channels," IEEE Trans. Veh. TechnoL vol. 44, Feb. 1995, pp.163-171. 

[26] J. K. Wolf, "Efficient maximum likelihood decoding of linear block codes using trellis," IEEE 

Trans. Inform. Theory, vol IT-24, pp. 76-80, Jan. 1978, 

[27] J. M. Wozencraft and I. M, Jacobs, Principles of Communication Engineering, John Wiley & 
Sons, New York, pp. 213-214, 1965. 



Turin: MAP decoding in channels with memory 



[28] L. M. Zeger and H. Kobayashi, "Sequence estimation in GSM using the EM algorithm," Conf 
Inform. ScL and Syst,^ March, 1998. 



